A hybrid modeling framework for generalizable and interpretable predictions of ICU mortality across multiple hospitals

The development of reliable mortality risk stratification models is an active research area in computational healthcare. Mortality risk stratification provides a standard to assist physicians in evaluating a patient’s condition or prognosis objectively. Particular interest lies in methods that are transparent to clinical interpretation and that retain predictive power once validated across diverse datasets they were not trained on. This study addresses the challenge of consolidating numerous ICD codes for predictive modeling of ICU mortality, employing a hybrid modeling approach that integrates mechanistic, clinical knowledge with mathematical and machine learning models. A tree-structured network connecting independent modules that carry clinical meaning is implemented for interpretability. Our training strategy utilizes graph-theoretic methods for data analysis, aiming to identify the functions of individual black-box modules within the tree-structured network by harnessing solutions from specific max-cut problems. The trained model is then validated on external datasets from different hospitals, demonstrating successful generalization capabilities, particularly in binary-feature datasets where label assessment involves extrapolation.

To contrast our developed hybrid modeling framework for predicting ICU patient mortality with existing ICU mortality prediction models, we employed the Sequential Organ Failure Assessment (SOFA) score [2], excluding the Glasgow Coma Scale (GCS) score and urine output due to data limitations. The selected physiological parameters used for calculating the SOFA score, obtained within 24 hours of ICU admission, are detailed in Supplementary Table S1.

Supplementary Table S1. Physiological parameters required for calculating the SOFA score (obtained within 24 h of ICU admission). Values are represented as mean (standard deviation).

To assess the effectiveness of a mortality prediction model utilizing SOFA scores, we employed logistic regression. This statistical methodology leverages the inherent association between SOFA scores and the likelihood of ICU mortality.
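As an illustration of the baseline described above, the following is a minimal sketch of a single-predictor logistic regression, fitted by Newton-Raphson in plain Python. It is not the authors' implementation: the function names (`fit_logistic`, `predict_risk`) and the toy values in `sofa_scores` and `died` are hypothetical and chosen only to make the example runnable.

```python
import math


def fit_logistic(scores, died, iterations=25):
    """Fit P(death) = 1 / (1 + exp(-(b0 + b1 * score))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (g) of the log-likelihood and Fisher information (h).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(scores, died):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break  # information matrix is numerically singular; stop updating
        # Newton step: solve the 2x2 system h * delta = g in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def predict_risk(b0, b1, score):
    """Predicted mortality probability for a given score."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * score)))


# Hypothetical toy data (NOT from the study): one score and one outcome per patient.
sofa_scores = [1, 2, 2, 3, 4, 4, 5, 6, 7, 8, 9, 10]
died = [0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

b0, b1 = fit_logistic(sofa_scores, died)
print(f"intercept={b0:.3f}, slope={b1:.3f}")
print(f"predicted risk at score 8: {predict_risk(b0, b1, 8):.3f}")
```

A positive slope means that a higher score maps to a higher predicted mortality risk. In practice a statistics library would normally be used for the fit; the explicit update is shown here only to make the model form visible.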

Supplementary Figure S1 depicts receiver operating characteristic (ROC) curves illustrating the discriminative performance of the logistic regression model across multiple hospitals. Each curve corresponds to a different hospital, with the Area Under the Curve (AUC) serving as a metric for the model's discriminatory ability. Despite the rationale behind employing the SOFA score, the AUC values reveal suboptimal discriminative performance across all hospitals.

Supplementary Figure S1. ROC curves illustrating the discriminative performance of a logistic regression model utilizing SOFA scores to predict patient ICU mortality across multiple hospitals. Each curve represents a different hospital, with the AUC quantifying the model's discriminatory ability. The proximity of the AUCs to that of a random classifier suggests poor discriminative performance across all hospitals.
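To make the AUC comparison concrete, the sketch below computes the AUC through its rank interpretation: the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The function name `roc_auc` and the example values are illustrative assumptions, not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive (1) and negative (0) cases."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties contribute one half
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for four patients (1 = died, 0 = survived).
example_labels = [0, 0, 1, 1]
example_risks = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(example_labels, example_risks))
```

Under this definition a classifier that assigns the same score to every patient obtains an AUC of 0.5, which is the random-classifier reference that the curves in Supplementary Figure S1 are compared against.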
The deliberate exclusion of specific potential causes of mortality, such as heart failure, from the tree-structured network depicted in Figure 2 in the main manuscript aimed to enhance the precision and relevance of our study on mortality prediction among critically ill ICU patients, particularly within the five German hospitals involved in this study. Despite the acknowledged significance of heart failure in the broader medical context, the incorporation of a heart failure module into our first-layer modules revealed limited discriminative value specific to our dataset.
Supplementary Table S2 provides an overview of heart failure-related features, encompassing aspects from patient medical history and those reflected in the ICD codes of the studied patient cohorts across the five German hospitals involved in this study. In our exploration of the discriminability of binary heart failure-related features highlighted in Supplementary Table S2, a crucial metric employed is the Point-Biserial Correlation Coefficient r_pb. This metric quantifies the strength and direction of the association between a binary feature and a target variable, in our case, mortality of the ICU patients.

Supplementary Table S3 provides the representation of r_pb for binary heart failure-related features and their correlation with patient mortality, categorized by hospital. Interpretation of this coefficient involves recognizing that a value close to 0 suggests a weak association, while values close to 1 or -1 indicate a strong association. Additionally, the P value accompanying the coefficient denotes the statistical significance of the association; a low correlation coefficient with a high P value suggests a weak and possibly non-discriminative association.

Supplementary Table S3. The representation of r_pb for binary heart failure-related features and their correlation with patient mortality, categorized by hospital. A value of r_pb close to 0 suggests a weak association, while values close to 1 or -1 indicate a strong association. P values < 0.05 indicate a significant difference in the mean mortality for the two groups defined by the binary heart failure-related feature.

The calculation of the P value linked with r_pb involves a statistical hypothesis test. The null hypothesis posits no correlation between the binary heart failure-related feature and mortality, while the alternative hypothesis suggests a correlation. Specifically, a t-test is employed, assuming that the mean mortality for the two groups defined by the binary heart failure-related feature is either the same (null hypothesis) or different (alternative hypothesis).
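The coefficient and its test statistic can be sketched as follows, using the standard textbook definition r_pb = (M1 - M0) / s * sqrt(n1 * n0 / n^2), where M1 and M0 are the mean outcomes in the two feature groups and s is the population standard deviation of the outcome, together with t = r_pb * sqrt((n - 2) / (1 - r_pb^2)) on n - 2 degrees of freedom. This is an assumption about the exact variant used, since the source does not spell out the formula; the function name `point_biserial` and the toy data are hypothetical.

```python
import math


def point_biserial(feature, outcome):
    """Return (r_pb, t, df) for a binary feature (0/1) and a numeric outcome."""
    n = len(feature)
    group1 = [y for x, y in zip(feature, outcome) if x == 1]
    group0 = [y for x, y in zip(feature, outcome) if x == 0]
    n1, n0 = len(group1), len(group0)
    if n1 == 0 or n0 == 0:
        raise ValueError("both feature groups must be non-empty")
    mean_all = sum(outcome) / n
    # Population standard deviation of the outcome (divide by n).
    sd = math.sqrt(sum((y - mean_all) ** 2 for y in outcome) / n)
    if sd == 0:
        raise ValueError("outcome has zero variance")
    m1 = sum(group1) / n1
    m0 = sum(group0) / n0
    r = (m1 - m0) / sd * math.sqrt(n1 * n0 / (n * n))
    df = n - 2
    denom = 1.0 - r * r
    # A perfect association gives an unbounded t statistic.
    t = math.copysign(math.inf, r) if denom <= 0 else r * math.sqrt(df / denom)
    return r, t, df


# Hypothetical toy data (NOT from the study): feature present (1) or absent (0),
# outcome 1 = died in ICU, 0 = survived.
hf_feature = [1, 1, 1, 0, 0, 0, 0, 1]
mortality = [1, 1, 0, 0, 0, 1, 0, 1]
r_pb, t_stat, dof = point_biserial(hf_feature, mortality)
print(f"r_pb={r_pb:.3f}, t={t_stat:.3f}, df={dof}")
```

The two-sided P value is then obtained from the Student t distribution with n - 2 degrees of freedom. The Python standard library has no t-distribution CDF, so a statistics package would be needed for that step; `scipy.stats.pointbiserialr`, for example, returns both the coefficient and the P value.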

To compare the association of heart failure-related features with ICU mortality against that of the binary features used in our model, we provide the r_pb values and the associated significance of the features used in our model in Supplementary Table S4. The results reveal that the majority of features exhibit high absolute values of r_pb across all hospitals, underscoring their substantial discriminative power in stratifying mortality.

Supplementary Table S4. The representation of r_pb for binary features used in our model and their correlation with patient mortality, categorized by hospital. A value of r_pb close to 0 suggests a weak association, while values close to 1 or -1 indicate a strong association. P values < 0.05 indicate a significant difference in the mean mortality for the two groups defined by the binary features used in our model.
Supplementary Table S2. Heart failure related features of the studied patient cohorts from five German hospitals. Variable distributions are reported as n (%).